Reverse engineering of gene regulatory networks is one of the big challenges in systems
biology. Gene regulatory networks are usually inferred from a set of single-gene over-expression
and/or knockout experiments. Functional relationships between genes are retrieved either from
the steady-state gene expression levels or from the respective time series. We present a novel
algorithm for gene network reconstruction on the basis of steady-state gene-chip data from
over-expression experiments. The algorithm is based on a straightforward solution of a linear
gene-dynamics equation, where experimental data is fed in as a first predictor for the solution.
We compare the algorithm's performance with the NIR algorithm, both on well-known E. coli
experimental data and on in-silico experiments. We show superiority of the proposed algorithm in
the number of correctly reconstructed links and discuss computational time and robustness. The
proposed algorithm is not limited by combinatorial explosion problems and can in principle be used
for large networks of thousands of genes.

PACS numbers: 



INTRODUCTION 

Prediction of functional relationships between genes,
starting from actual gene expression data, is one of the
primary goals of systems biology. Despite large efforts in
this direction [1, ?], based either on transcription factor-promoter
interactions [?] or on inferring gene networks [?],
methods for reliable prediction of the
collective behavior of gene activity are yet to be found.
Some general facts about the topology of gene regulatory
networks [?, 11, 12], the statistics of gene expression [13] and
the dynamics of gene regulation [5] are beginning to be
understood. This knowledge is far from sufficient to
successfully reconstruct gene networks, but it can help to
limit the tremendous number of parameters involved
in reconstruction. Even if the average degree of the gene
regulatory network [26], i.e. the number of genes
regulated by some gene on average, were known, noisy and
limited data would always lead to severe problems.

There are basically two types of reverse engineering
approaches, depending on the experimental setup:
inferring the gene network from steady-state [?] or from
time-series [14, ?] experiments. From steady-state
experiments one cannot draw any conclusions about the
dynamics of gene regulation. Time-series experiments
give helpful insights into gene regulatory dynamics,
but often at the price of redundant
information. Further, due to costs, full time-series data
on gene expression are in general not available. As
described in [1], one can further divide reverse
engineering methods into four categories: differential
equation models [?], Boolean network models [?], Bayesian
network models [8] and association networks [16].

How can gene regulatory network reconstruction methods
be validated and compared? Neither a standardized
biological benchmark nor a consensus on what class of
models to use for in-silico testing exists [1]. The usual
way is to validate a method by applying it either to a
given experimental dataset or to in-silico datasets.
Each case comes with its own problems.
Applying a method to an experimental dataset poses the
problem of comparing the reconstruction result with a
network that is always just a consensus on what the
biological network could look like, never the exact gene
regulatory network. On the other hand, when applying
a reconstruction method to in-silico data one has a
perfect reference network, but the generated time-series
data is the result of a dynamical model of gene interaction
that cannot be shown to coincide with the real gene
regulation dynamics. We selected an E. coli
dataset [?] as a satisfactory biological validation of our
algorithm, because the underlying SOS response network
has been subject to over 30 years of research, which
provides a reasonable consensus about the actual
gene regulation in that particular sub-network.
For in-silico validation we used the gene regulation model
proposed in [17]. This model simultaneously captures a
series of experimental facts, such as the distribution of
genome-wide gene-expression levels, multi-stability and
periodicity.

In the following we introduce a novel reverse engineering
algorithm and compare it with Network Identification
by multiple Regression (NIR) [9]. NIR has been
applied to the same SOS response network of E. coli, which
makes it possible to compare the two algorithms.
Further, NIR is considered the state-of-the-art algorithm,
which has so far not been outperformed in the quality of
reconstruction. There are faster algorithms than NIR,
which suffers from a binomial explosion problem and is
thus limited to relatively small networks. One fast
algorithm was presented in [18]; another recent algorithm [8]
claims the ability to reconstruct links with better
statistical significance.

The idea of the NIR algorithm is to reconstruct the
network by a least-squares regression approach, where
RNA concentrations are the regressors and the external
perturbation is the dependent variable. NIR enforces the
same number of regulatory links for every gene, which is
clearly unrealistic. The ensemble of links that provides
the smallest squared error is selected as the optimal solution.
For a given average degree $\langle k \rangle$, the least-squares error is
calculated for all $\binom{N}{\langle k \rangle}$ combinations, where $N$ is the
size of the network. This becomes a combinatorial
problem for large networks.
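
To give a feeling for this growth, the following minimal Python sketch counts candidate regressor sets. It assumes, as a simplification that is not taken from the NIR publication, that each of the N genes is fitted separately with exactly k regulators chosen among the N genes; the function name is purely illustrative.

```python
from math import comb

def nir_candidate_sets(n_genes: int, k: int) -> int:
    """Number of k-regulator subsets per gene, summed over all genes.

    Assumption for illustration only: one least-squares fit per subset
    and per gene, i.e. n_genes * C(n_genes, k) fits in total.
    """
    return n_genes * comb(n_genes, k)

# Growth of the search space for a fixed number of regulators k = 4.
for n in (9, 100, 1000):
    print(n, nir_candidate_sets(n, 4))
```

Under this assumption the count grows roughly like N^(k+1), which is why exhaustive subset regression is restricted to small networks.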



RECONSTRUCTION METHOD 

A system of interacting genes can be seen as a complex
network, where every directed link represents a functional
relationship between two genes. For simplicity, let us
assume that such a link comprises both the transcriptional and
the translational level of gene interaction. In this
oversimplified view one can assume that the gene expression levels
change in time as



$$\frac{dX}{dt} = g(X(t)) \tag{1}$$



where $g(X)$ is an a priori unknown function of the
time-dependent vector of gene expression levels $X$. If there
are $N$ genes in the (sub)network under study (e.g. $N$
genes on a custom chip), the vector $X$ has $N$ components.
If we assume, as in [14], that $g(X(t))$ is a linear function
(or linearize a more complicated function),
one can write Equation (1) as



$$\frac{dX}{dt} = AX + \mu \tag{2}$$



where $\mu$ is a vector of gene over-expressions and $A$ is a
constant adjacency matrix containing the "strength" of the
gene-gene interactions. The elements $A_{ij}$ can be positive
or negative real numbers, indicating activating or
inhibiting interactions, respectively. By solving Equation (2)
one formally gets



$$X^{\mu}(t) = e^{At} X^{0} + A^{-1}\left(e^{At} - I\right)\mu \tag{3}$$



where the superscript $\mu$ indicates that the system was perturbed
with the constant vector $\mu$. After $M$ over-expressions,
one can write the above equation in matrix form



$$X = e^{At} X^{0} + A^{-1}\left(e^{At} - I\right)\mu \tag{4}$$



where $X$ is the collection of all gene expression levels
after the $M$ over-expression experiments, organized in an
$N \times M$ matrix in which each of the $M$ columns is a
time-dependent $N$-vector of gene expression levels for a
different over-expressed gene. In the following let us
assume that we are able to perform $N$ over-expression
experiments, i.e. $M = N$. $\mu$ is then an $N \times N$ matrix
of gene over-expression levels. It is diagonal because
in every over-expression experiment just one gene is
over-expressed (which is the experimentally feasible
case). We emphasize that even though we
know from the way over-expression experiments are
prepared that the matrix $\mu$ is diagonal, one often has little to
no experimental control over the exact amplitude of its
entries. This problem is mitigated for small times $t \ll 1$.
To see this we define



$$Q = \frac{1}{t}\left(X(t) - e^{At} X^{0}\right)\mu^{-1}. \tag{5}$$



Using this definition and abbreviating $\Lambda = At$, Equation
(4) can be rewritten as

$$\Lambda = \ln\left(I + \Lambda Q\right). \tag{6}$$
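
The step from Equation (4) to Equation (6) can be spelled out as follows. This is only a sketch of the algebra; it assumes that $A$ and the diagonal perturbation matrix $\mu$ are invertible and uses the abbreviation $\Lambda = At$.

```latex
\begin{align*}
X(t) - e^{At}X^{0} &= A^{-1}\left(e^{At} - I\right)\mu
  && \text{(Equation (4), rearranged)} \\
tQ = \left(X(t) - e^{At}X^{0}\right)\mu^{-1} &= A^{-1}\left(e^{At} - I\right)
  && \text{(definition of } Q\text{, Equation (5))} \\
\Lambda Q = At\,Q &= e^{\Lambda} - I
  && \text{(multiply from the left by } A\text{)} \\
\Lambda &= \ln\left(I + \Lambda Q\right)
  && \text{(take the matrix logarithm)}
\end{align*}
```

Written this way, $Q = \Lambda^{-1}\left(e^{\Lambda} - I\right) = I + \Lambda/2 + \Lambda^{2}/6 + \dots$, so the deviation of $Q$ from the identity is controlled by the size of $\Lambda = At$.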

It is easy to check that in the short-time limit

$$\lim_{t \to 0} Q = I \tag{7}$$

holds. For very short times $t$ our lack of knowledge of the amplitudes of $\mu$ is
thus basically irrelevant, and estimating $\Lambda Q$ reduces to
estimating $\Lambda$. It is also not hard to see that for sparse
adjacency matrices $A$ the relative response



$$Y_{ij} = \frac{X_i^{j} - X_i^{0}}{X_i^{0}} \tag{8}$$



will provide a good first estimate, i.e. $A \propto Y$, where
$X_i^{0}$ is the gene expression level of the $i$th gene when no
perturbation has occurred ($\mu = 0$) and $X_i^{j}$ is the gene
expression level of the $i$th gene when the $j$th gene has been
over-expressed. Moreover, linearity ensures that relative
responses for short times will be small, $|Y_{ij}| \ll 1$.

However, for times of the order of a cell cycle, $t \sim 1$,
and for less sparse matrices these estimates will not be
sufficient. The idea is to replace $Y_{ij}$ by some function $D_{ij} =
f(Y_{ij})$ with the properties that (i) $f(Y_{ij}) \sim Y_{ij}$ for
$|Y_{ij}| \ll 1$ and (ii) $f$ is a monotonically increasing
function. Since in practice $Y_{ij}$ can range over many decades
in amplitude, we also presume that (iii) $f$ should be a
concave function. Lastly, (iv) $f$ has to be defined on $[-1, \infty]$,
since $-1 < Y_{ij}$ while $Y_{ij}$ could in principle be arbitrarily large
for positive values. Maybe the simplest function fulfilling
these requirements is the logarithm; thus we can estimate
$\Lambda Q = D$ with $f(x) = \ln(1 + x)$, i.e.



$$D_{ij} = \ln\left(\frac{X_i^{j}}{X_i^{0}}\right). \tag{9}$$
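
As an illustration of Equations (8) and (9), the following minimal Python sketch computes the relative response $Y$ and the logarithmic predictor $D$ from steady-state data. The function names and the data layout (`X[i][j]` is the level of gene `i` when gene `j` is over-expressed, `X0[i]` the unperturbed level) are assumptions made for this example, and all expression levels are assumed to be strictly positive.

```python
from math import log

def relative_response(X0, X):
    """Y[i][j] = (X[i][j] - X0[i]) / X0[i], cf. Equation (8)."""
    return [[(X[i][j] - X0[i]) / X0[i] for j in range(len(X[i]))]
            for i in range(len(X0))]

def log_predictor(Y):
    """D[i][j] = ln(1 + Y[i][j]) = ln(X[i][j] / X0[i]), cf. Equation (9)."""
    return [[log(1.0 + y) for y in row] for row in Y]

# Toy data for two genes (made-up numbers, not from any experiment).
X0 = [1.0, 2.0]
X = [[2.0, 1.0],
     [2.0, 4.0]]
Y = relative_response(X0, X)
D = log_predictor(Y)
print(Y)
print(D)
```

For small responses the two matrices nearly coincide, since $\ln(1 + y) \approx y$ for $|y| \ll 1$; for large responses the logarithm compresses the range.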



This means that we effectively estimate $\Lambda = \ln(I + D)$,
where $I$ is the identity matrix. For the matrix
logarithm to provide unique solutions, $I + D$ should not have
any negative real eigenvalues. Since experimental results
show that this is not the case in general, we use a cleaned
version (see below) of $D$, denoted by $D^{c}$, such that $I + D^{c}$
has no negative real eigenvalues, and the prediction of the
adjacency matrix is given by

$$A = \ln\left(I + D^{c}\right). \tag{10}$$



Eigenvalue cleaning



In general, the logarithm of a matrix can have an
infinite number of real and complex solutions. In order to
find a unique solution of $\ln(D + I)$, the matrix $D + I$ must
not have negative real eigenvalues. If we look
at the eigenspectrum of the matrix $D$ from various
experiments, both biological and in-silico, we notice that most
of the eigenvalues are complex; however, a small number
of eigenvalues are real, both positive and negative. Thus
we first have to clean the matrix $D + I$, i.e. set
all its negative real eigenvalues to a small positive number $\epsilon$.
This is done by first diagonalizing the matrix $D$:



$$U^{-1} D U = \mathrm{diag}(d_1, \ldots, d_N), \tag{11}$$



where the eigenvalues are ordered such that the first $L$
eigenvalues are real and less than $-1$, i.e. $d_i^{*} = d_i < -1, \ \forall i \le L$.
These $L$ diagonal elements are set to $\epsilon - 1$,



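$$D' = \begin{pmatrix}
\epsilon - 1 & & & & & \\
 & \ddots & & & & \\
 & & \epsilon - 1 & & & \\
 & & & d_{L+1} & & \\
 & & & & \ddots & \\
 & & & & & d_N
\end{pmatrix}, \tag{12}$$

and are rotated back to yield the cleaned matrix

$$D^{c} = U D' U^{-1}. \tag{13}$$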
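
The cleaning steps of Equations (11) to (13), followed by the matrix logarithm, can be sketched in Python as below. This is not the authors' code: it assumes NumPy is available, that $D$ is diagonalizable, and it evaluates the principal logarithm through the same eigen-decomposition. The function name, the tolerance and the default value of `eps` are choices made for this illustration.

```python
import numpy as np

def clean_and_log(D, eps=1e-3, tol=1e-12):
    """Return (D_c, A): the cleaned matrix and A = ln(D_c + I)."""
    D = np.asarray(D, dtype=float)
    d, U = np.linalg.eig(D)            # D = U diag(d) U^{-1}
    d = d.astype(complex)
    U_inv = np.linalg.inv(U)
    # Real eigenvalues d_i <= -1 give non-positive real eigenvalues of D + I;
    # they are replaced by eps - 1, so that D_c + I has eigenvalue eps there.
    bad = (np.abs(d.imag) < tol) & (d.real <= -1.0)
    d[bad] = eps - 1.0
    D_c = (U * d) @ U_inv              # U diag(d') U^{-1}
    A = (U * np.log(d + 1.0)) @ U_inv  # principal log via the same eigenvectors
    # For a real input matrix the imaginary parts are rounding noise only.
    return D_c.real, A.real

# Toy example with one eigenvalue below -1 (made-up numbers).
D = np.array([[-3.0, 0.0],
              [0.0, 0.5]])
D_c, A = clean_and_log(D)
print(D_c)
print(A)
```

Because only the offending eigenvalues are modified, a matrix $D$ without real eigenvalues at or below $-1$ passes through the cleaning step unchanged.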



The matrix $D^{c} + I$ no longer has negative real eigenvalues,
and a unique prediction of the adjacency matrix $A$, i.e. the
reconstructed gene regulatory network, can be given:

$$A = \ln\left(D^{c} + I\right). \tag{14}$$



Thresholding



Our solution $A$ will in general represent a fully
connected network, with a certain distribution of link
weights around zero. The reason why we always
get a fully connected network, i.e. a network without
zero entries in the adjacency matrix, is the noise in the
measurements. Real gene regulatory networks are never
fully connected but are characterized by an average
degree $\langle k \rangle$, which has been estimated to be relatively small,
$\langle k \rangle \approx 2$ to $4$ [12]. For simplicity we assume $\langle k \rangle$ for the
undirected unweighted case. Knowledge of $\langle k \rangle$ allows us to
define a clear thresholding scheme. All entries of $A$ whose absolute value is below a
threshold $a$ are set to zero, where $a$ is chosen such that the matrix
$A^{a}$ has the average degree $\langle k \rangle$, i.e.



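$$A^{a}_{ij} = A_{ij}\,\Theta\left(|A_{ij}| - a\right), \tag{15}$$

such that

$$\frac{1}{N}\sum_{i,j=1}^{N} \Theta\left(|A_{ij}| - a\right) = \langle k \rangle, \tag{16}$$

where $\Theta$ denotes the Heaviside step function.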



A is the first approximation of the gene regulatory net- 
work we want to reconstruct. 
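
A minimal Python sketch of such a thresholding step is given below. Instead of solving for the threshold explicitly, it keeps the $N \langle k \rangle$ entries with the largest absolute weight, which is equivalent as long as there are no ties. The function name, the rounding of $N \langle k \rangle$ and the inclusion of diagonal entries are assumptions made for this example.

```python
def threshold_by_degree(A, k_avg):
    """Zero all but the round(N * k_avg) entries of A with largest |A[i][j]|."""
    n = len(A)
    n_links = min(n * n, max(0, round(n * k_avg)))
    # Rank all entries by absolute weight, strongest first.
    ranked = sorted(((abs(A[i][j]), i, j) for i in range(n) for j in range(n)),
                    reverse=True)
    keep = {(i, j) for _, i, j in ranked[:n_links]}
    return [[A[i][j] if (i, j) in keep else 0.0 for j in range(n)]
            for i in range(n)]

# Toy weight matrix (made-up numbers) thresholded to average degree 1.
A = [[0.9, -0.1, 0.05],
     [0.2, -0.8, 0.01],
     [0.0, 0.3, -0.02]]
print(threshold_by_degree(A, 1))
```

Since the entries are ranked once, changing `k_avg` only moves the cut-off along the same ordering of links.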



A note on fewer than N experiments 

In the case where the number of over-expression
experiments $M$ is lower than the size of the network $N$, the
matrix $D$ is not square, and thus we are unable to calculate
the matrix logarithm. Information about the influence
of gene $j$ ($j > M$) on gene $i$ is missing. A way around
this is to introduce a measure of the distance
between two genes in the network. Although the
correlation between gene expressions in different over-expression
measurements cannot lead to any conclusion about the
functional relations among genes, it can provide a good
measure of the distance between genes in the
network. One can therefore simply calculate a matrix of
correlation coefficients and use it to replace the missing terms in
$D$:



Dij = 



(17) 

Here the first index of $D_{ij}$, $i$, runs over the domain $M <
i \le N$, and the second index over $1 \le j \le N$.



TESTING THE METHOD 

We compare our results with the NIR algorithm both
on an in-silico dataset and on the E. coli SOS
response network [19, 20, 21, 22, 23], in the same way as
in [?]. We measure performance in two ways: first, by
counting the fraction of correctly reconstructed positive,
negative and zero links, denoted by $F_{+}$, $F_{-}$ and $F_{0}$,
respectively. For later use we define $F = F_{+} + F_{-} + F_{0}$.
Second, by calculating the extended Matthews correlation
coefficient [24], a discrete version of Pearson's
correlation coefficient, extended to $K \times K$ confusion
matrices. The Matthews correlation coefficient takes
values in the interval $[-1, 1]$, where $0$ stands for no
correlation between the predicted and the real case, and $1$ and $-1$
stand for complete positive and complete negative correlation, respectively.
The $K$-category correlation coefficient is defined as



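$$R_K = \frac{\sum_{klm}\left(C_{kk}C_{lm} - C_{kl}C_{mk}\right)}{\sqrt{\sum_{k}\left(\sum_{l} C_{kl}\right)\left(\sum_{l',\,k' \neq k} C_{k'l'}\right)}\;\sqrt{\sum_{k}\left(\sum_{l} C_{lk}\right)\left(\sum_{l',\,k' \neq k} C_{l'k'}\right)}}, \tag{18}$$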

where $C$ is a $K \times K$ confusion matrix; more precisely,
the element $C_{kl}$ counts the number of cases where
category $k$ is predicted but category $l$ was present. In our
case $K = 3$, and the categories are: positive link, negative
link and no link between any two genes. It is
straightforward to see that p-values for any value of $R_K$ can be
computed in exactly the same way as for the Pearson
correlation coefficient, provided the sample size is given.
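
The $K$-category coefficient can be evaluated directly from a confusion matrix, as in the following Python sketch. It uses the fact that the inner sums in the denominator reduce to row and column totals; the function name and the convention of returning 0 for a degenerate denominator are choices made for this example.

```python
from math import sqrt

def rk_coefficient(C):
    """K-category correlation coefficient of a K x K confusion matrix C.

    C[k][l] counts cases where category k was predicted and l was present.
    """
    K = len(C)
    total = sum(sum(row) for row in C)
    row_sums = [sum(C[k]) for k in range(K)]                       # predicted totals
    col_sums = [sum(C[l][k] for l in range(K)) for k in range(K)]  # true totals
    numerator = sum(C[k][k] * C[l][m] - C[k][l] * C[m][k]
                    for k in range(K) for l in range(K) for m in range(K))
    denom_rows = sum(r * (total - r) for r in row_sums)
    denom_cols = sum(c * (total - c) for c in col_sums)
    if denom_rows == 0 or denom_cols == 0:
        return 0.0  # convention chosen here for degenerate cases
    return numerator / (sqrt(denom_rows) * sqrt(denom_cols))

# A perfect three-class prediction and a statistically independent one.
print(rk_coefficient([[5, 0, 0], [0, 3, 0], [0, 0, 2]]))
print(rk_coefficient([[2, 1, 1], [4, 2, 2], [2, 1, 1]]))
```

The first matrix is purely diagonal, the second one factorizes into a product of row and column weights, which corresponds to a prediction that is independent of the true categories.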

It is important to stress the difference in measuring
reconstruction performance in in-silico and biological
experiments. While in biological networks self-regulation
is a part of the complete gene regulatory network, in
numerically simulated gene regulation dynamics
self-regulation is often screened by negative self-degradation
rates, which have to be imposed in order to keep the
dynamics sufficiently stable, see e.g. [17]. To be as
correct and conservative as possible, we therefore compare
our reconstructed adjacency matrix only with the
off-diagonal elements in the in-silico case. In the E. coli
case we of course compare with the complete adjacency
matrix.



In-silico testing 

We employ a recently proposed dynamical gene-gene
interaction model, which is able to capture a series of
experimental facts on gene-expression statistics [17]: (i) the
distribution of gene-expression increments over time, (ii)
multiple equilibria, (iii) stability. The model is defined
as



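$$\frac{d}{dt} x_i(t) = \sum_{j} A^{\mathrm{model}}_{ij}\left(x_j(t) - x_j^{0}\right) + \xi_i\left(x_i(t) - x_i^{0}\right) + \eta_i, \tag{19}$$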

with a positivity condition imposed for gene expression 
levels (non negativity of concentrations): 



$$x_i(t) \ge 0 \quad \forall i. \tag{20}$$



Here, $A^{\mathrm{model}}$ is a real-valued adjacency matrix of
gene-gene interactions. It is modeled as a particular random
matrix, mimicking experimentally known facts [17]. $x(t)$
is the vector of gene-expression levels at time $t$, and the constant
vector $x^{0}$ indicates the steady-state gene-expression levels. $\xi$
and $\eta$ are multiplicative and additive noise terms,
respectively, which are a generic feature of chemical reactions.
Using the dynamics defined in Equation (19) we generate
the time series of gene expression levels $x(t)$ and simulate
the effect of a perturbation by adding a constant
perturbation vector to Equation (19). For details, see [17].



We measure the gene expression levels as time averages
over concentrations: $X_i^{0} = \frac{1}{t_2 - t_1}\int_{t_1}^{t_2} x_i(t)\,dt$ and
$X_i^{j} = \frac{1}{t_4 - t_3}\int_{t_3}^{t_4} x_i(t)\,dt$, where $t_0 < t_1 < t_2 < t_p < t_3 < t_4$.
$t_0$ is the initial time point of the simulation (after
discounting transient behavior), and $t_p$ is the time at which the
perturbation vector (with the $j$th component being non-zero)
is applied. The procedure is depicted in Figure 1.



Testing on the E. Coli dataset 

We use data for the wild-type E. coli strain MG1655, available
at [9]. The reason for testing our method on this
particular dataset is that the SOS response of E. coli
is well understood and some consensus on the
topology of its gene regulatory network has been reached. Moreover, it
is possible to compare the reconstruction success with that of other
groups [1, ?]. We test the performance by
counting the fraction of correctly reconstructed links of all
three classes (positive, negative and zero) and with the
extended Matthews correlation coefficient.



The pure-chance reconstruction threshold 

A strong criterion for checking the performance of any
reconstruction method is to compare it
with a purely random reconstruction. Several proposed
gene network reconstruction algorithms can be shown to
perform only slightly above pure-chance reconstruction.
Random reconstruction can be performed in the following
way. Suppose that $\langle k \rangle$ denotes the true average degree
of the network, which may or may not be known, and
$k_g$ denotes a guess of $\langle k \rangle$. Since we estimate that the
directed network has $L = N k_g$ links, we take a fully
connected network and assign a random order to all $N(N-1)$
links. Then we draw a random symbol with three possible
outcomes: $+$ (positive weight), $-$ (negative weight) and $0$
(no link), and assume that there are as many positive
as negative links. The distribution of these outcomes
is therefore such that $+$ and $-$ each occur with
probability $w_{\pm} = k_g/2N$, while $0$ appears with probability
$w_0 = 1 - k_g/N$. The true probabilities, i.e. the
probabilities of $+$, $-$ and $0$ if the true average degree $\langle k \rangle$ were known,
are however $p_{\pm} = \langle k \rangle/2N$ and $p_0 = 1 - \langle k \rangle/N$. Now we pick
one link after another in the given random order and
assign a random symbol $+$, $-$ or $0$, and repeat this until $L$
links have been assigned either $+$ or $-$. Since 'throwing
the dice' is an event independent of the network
topology, one can simply compute $F_{\pm}^{\mathrm{rand}}(k_g|\langle k \rangle) = w_{\pm} p_{\pm}$ and
$F_{0}^{\mathrm{rand}}(k_g|\langle k \rangle) = w_0 p_0$.
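
The pure-chance fractions follow directly from these probabilities. The short Python sketch below evaluates them for a guessed and a true average degree; the function name and the dictionary layout of the result are assumptions made for this example.

```python
def pure_chance_fractions(n_genes, k_guess, k_true):
    """Expected fractions of correct links for a purely random reconstruction."""
    w_pm = k_guess / (2 * n_genes)   # probability of guessing + (or -)
    w_0 = 1 - k_guess / n_genes      # probability of guessing "no link"
    p_pm = k_true / (2 * n_genes)    # true probability of + (or -)
    p_0 = 1 - k_true / n_genes       # true probability of "no link"
    f_plus = w_pm * p_pm
    f_minus = w_pm * p_pm
    f_zero = w_0 * p_0
    return {"F+": f_plus, "F-": f_minus, "F0": f_zero,
            "F": f_plus + f_minus + f_zero}

# Example: a 9-gene network with guessed and true average degree 4.
print(pure_chance_fractions(9, 4, 4))
```

Most of the pure-chance score comes from correctly guessed absent links, because sparse networks contain far more zeros than links.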



If reconstruction is based on pure chance, the expected
$K$-category correlation will be $R_K = 0$. This can be seen
by inserting the confusion matrix $C_{ij} = w_i p_j$, with $i$ and $j$
indexing $+$, $-$ or $0$, into Equation (18).
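
Explicitly, the numerator of the $K$-category correlation coefficient, $\sum_{klm}(C_{kk}C_{lm} - C_{kl}C_{mk})$, factorizes for a product-form confusion matrix. The following lines are a sketch of that calculation, using only $C_{ij} = w_i p_j$:

```latex
\begin{align*}
\sum_{klm}\left(C_{kk}C_{lm} - C_{kl}C_{mk}\right)
 &= \sum_{klm}\left(w_k p_k\, w_l p_m - w_k p_l\, w_m p_k\right) \\
 &= \Big(\sum_k w_k p_k\Big)\Big(\sum_l w_l\Big)\Big(\sum_m p_m\Big)
  - \Big(\sum_k w_k p_k\Big)\Big(\sum_l p_l\Big)\Big(\sum_m w_m\Big) \\
 &= 0 .
\end{align*}
```

The two products are identical term by term, so the numerator vanishes for any choice of $w$ and $p$, and $R_K = 0$ whenever the denominator is non-zero.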



RESULTS 
Reconstruction on in-silico data 

We generated three different networks ($N = 10$) with
different connectivities ($\langle k \rangle \in \{1, 3, 5\}$) for
in-silico testing of our reconstruction algorithm. Using
the fixed adjacency matrices $A^{\mathrm{model}}$ of these networks,
we simulated time series of gene expression levels (see
Figure 1) according to Equation (19), with noise levels
$\sigma_{\xi} = \sigma_{\eta} = 0.1$, where $\xi \in N(0, \sigma_{\xi})$ and $\eta \in N(0, \sigma_{\eta})$; for
details see [17]. As described in the previous section, we
measured the steady-state gene expression levels before and
after the perturbation of each gene in the network,
denoted by $X_i^{0}$ and $X_i^{j}$, respectively. The data generated in this way
was taken as input for both reconstruction methods.
In this case the exact value of the over-expression vector $\mu$
was used as an extra input parameter for the NIR
reconstruction. In reality this exact value remains unknown.
Results were produced for 20 statistically identical
realizations of networks for every connectivity $\langle k \rangle \in \{1, 3, 5\}$.
All networks gave very similar results, so only one
for every connectivity is shown in Figure 2, where we
compare the results of our reconstruction method with the
NIR algorithm for in-silico experiments. The left panel
of the figure shows the fraction of correctly reconstructed
links for every link type ($F_{+}$, $F_{-}$ and $F_{0}$) as well as their
sum $F$. The colors blue and green represent NIR
and the proposed method, respectively. The pure-chance
threshold is shown to emphasize the significance of the
result. The right panel shows the extended Matthews
correlation coefficient, for which the pure-chance threshold
is constant at zero. For the fraction of correctly
reconstructed links, our method clearly performs about as well
as NIR for very sparse networks ($\langle k \rangle_{\mathrm{model}} = 1$) and
outperforms it in more densely connected networks.
There, the fractions of correctly reconstructed links
show a slightly better performance
of our algorithm, while for the extended Matthews
correlation coefficient the difference is much more notable. To
understand this difference, one has to take a closer look
at the type I and type II errors of both methods. While
the NIR algorithm makes almost the same number of
reconstruction errors of all types, there is a clear distinction
in the errors made by our reconstruction algorithm. The vast
majority of errors are made by assuming that there is a
link (positive or negative) between two genes while in
the real case there is none, and vice versa. Only a few
mistakes are made where a real positive link is
reconstructed as negative, or vice versa. This is an additional
asset of the proposed reconstruction algorithm.



Reconstruction of the E. Coli SOS network 

Although our reconstruction method showed better
results than NIR when tested on in-silico networks, the true value
of any reconstruction method can only be shown on
real biological data. When testing both methods on
E. coli data, as shown in Figure 3, our reconstruction
method outperforms NIR more visibly, in both
performance measures. To stress the difference in the quality
of reconstruction we present p-values of the correlation
coefficients between the real and reconstructed networks.
Given the sample size $K = 81$, i.e. the number
of links to be reconstructed, and $\langle k \rangle = 4$ (known
experimental value), the p-value of the correlation coefficient
$R_3^{\mathrm{NIR}} = 0.14$ for NIR is $p^{\mathrm{NIR}} = 0.2126$, while the
p-value of the correlation coefficient $R_3 = 0.4$ for our method
is $p = 0.002$. For the $R_3$ values see Figure 3, at $\langle k \rangle = 4$.
Our reconstruction leads to a network which correlates
significantly better with the experimentally known
biological network.

One can easily notice that both reconstruction
methods applied to in-silico data have their maxima in
performance when the input average degree equals the true
one, $\langle k \rangle = \langle k \rangle_{\mathrm{model}}$, which can be seen as an additional
consistency check of the algorithm. On the other hand,
after applying both reconstruction methods to E. coli
data, only the proposed reconstruction algorithm shows
its performance maximum at the $\langle k \rangle = \langle k \rangle_{E.\,coli}$ point,
while the NIR method behaves similarly to
pure-chance reconstruction.

The computational time needed to run the NIR
algorithm on this particular 9-node network is of the order
of 1 minute, while our approach takes less
than a second, both on a standard personal
computer. The NIR algorithm is unable to cope with the
reconstruction of significantly larger gene regulatory
networks, in terms of both time and memory consumption, while
our method can deal with network sizes of up to realistic
genomes.

Because of the typically high levels of noise and
uncertainty in biological data collected in actual
experiments, the robustness of a method is of crucial
importance. We tested both NIR and our algorithm in
the following way: in the in-silico experiments we
generated a data matrix $X_1$ with $N$ genes and $N$ experiments.
In this matrix we replace 2 randomly chosen columns by
random data (iid Gaussian entries with unit variance).
We call the resulting matrix $X_2$ and reconstruct networks $A_1$ and
$A_2$ from $X_1$ and $X_2$, respectively. Comparing these
two reconstructed networks, for the NIR
reconstruction we find a strong tendency of all links to change
their position. In the proposed method, links preferentially
change at the positions of the replaced columns.
DISCUSSION

We introduced a reverse engineering procedure for gene
regulatory networks, applicable to an experimental setup
where all the genes belonging to a genetic (sub)network
are over-expressed one after the other, after which
gene-chip measurements in the steady state are taken.
We showed the reconstruction performance on both
in-silico and biological data. The method is applicable to
large networks in terms of both memory and
computational time, which can be a
problem for algorithms limited by combinatorial explosion.

Apart from its technical benefits, the philosophy of our
reconstruction method complies perfectly with the
biological goals of conducting over-expression experiments.
In contrast to the NIR algorithm or similar
reconstruction methods, where the final solution is a network in which
every link has the same significance, our method ranks the
reconstructed links by their influence, which might be
very important in experimental gene interaction
detection. Instead of randomly picking links out
of a given reconstructed topology, here one can select
the interaction links with the highest weights. This also
ameliorates the consequences of not knowing the real
network connectivity $\langle k \rangle$ a priori. While selecting a good
value for $\langle k \rangle$ is crucial for getting reliable networks, it
will not influence the ordering of the links by importance
in the proposed algorithm. In other words, no matter
which $\langle k \rangle$ is taken, the ranking of the reliable links will
not change.

Another shortcoming of the NIR algorithm is
that the resulting network has a trivial, unrealistic degree
distribution, a delta function $\delta(k - k^{*})$. Thus, detecting
genetic hubs, peripheral genes, or any other topologically
important genes in the network is practically impossible.
The proposed method does not a priori restrict the
topology of the reconstructed network, except for the average
degree $\langle k \rangle$, which is important for the thresholding only.

For a successful reconstruction the NIR algorithm needs
information about the external perturbation as an input
parameter, which in most cases is only approximately
known. In the in-silico experiments we provided
the exact information to NIR; however, the NIR
algorithm was still outperformed.

Supported by WWTF LSI 29 and Austrian Science 
Fund FWF Project P19132. 
FIG. 1: Time series of three randomly selected trajectories (numerical solutions of Eq. (19)), showing the measurements of
gene expression levels in an in-silico over-expression experiment. Gene expression levels were measured from time $t_1$ until $t_2$ for
the steady-state levels, and from time $t_3$ until $t_4$ for the effect of the perturbation. At time $t_p$ one gene was over-expressed.
FIG. 2: Results of network reconstruction for the proposed algorithm (green lines) and NIR (blue lines) for in-silico experiments.
Both the fraction of correctly reconstructed links ((a)-(c)) and the extended Matthews correlation coefficient ((d)-(f))
are shown. Three in-silico networks with different average degrees were constructed, with $\langle k \rangle$ equal to 1 ((a),(d)), 3 ((b),(e)) and
5 ((c),(f)). In the plots showing the fraction of correctly reconstructed links, circles denote the fraction of positive links
$F_{+}$, squares the fraction of negative links $F_{-}$ and triangles the fraction of no links $F_{0}$. The red line represents the gambling threshold $F^{\mathrm{rand}}$.
FIG. 3: Reconstruction results for the proposed algorithm (green lines) and NIR (blue lines) for E. coli. Both the
fraction of correctly reconstructed links (a) and the extended Matthews correlation coefficient (b) are shown. In (a) circles denote
the fraction of positive links $F_{+}$, squares the fraction of negative links $F_{-}$ and triangles the fraction of no links $F_{0}$. The red line represents
the gambling threshold $F^{\mathrm{rand}}$.



